@rkistner commented Nov 20, 2025

Background

Currently, the replication process is effectively linear / "single-threaded". When a new sync rules version is deployed, we create a new replication stream, which snapshots each table and then starts streaming. This has a couple of limitations:

  1. Replication is slower than it needs to be, since tables cannot be replicated concurrently.
  2. After the initial table snapshots complete, there can be significant replication lag to catch up on.

The changes here are also part of the bigger project to implement differential sync rule updates - only re-replicating changed bucket definitions / sync stream definitions. Part of that requires switching to a single replication stream shared across all sync rule versions, and this PR builds the base for that.

Changes to storage implementation

First, this changes the underlying BucketStorageBatch implementations to be safe under concurrent usage. The design still assumes a single process doing streaming replication with commits at a time, but multiple other processes can now safely run snapshots concurrently.

To implement this, we reduce reliance on local state in favor of database state, which is safe for concurrent access. This does not introduce any new model fields yet, but we now rely more strongly on snapshot_done (tracks whether we are still waiting for any snapshots to complete) and keepalive_op (tracks ops persisted but not yet committed). One specific implication is that new slots always need an explicit markAllSnapshotDone() call; this is no longer done automatically.

This also removes the implementation difference between keepalive(lsn) and commit(lsn): both now do the same thing, as in the sketch below.
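To make the model concrete, here is a minimal TypeScript sketch of how these pieces could fit together. Only snapshot_done, keepalive_op, markAllSnapshotDone(), keepalive() and commit() are names from this PR; the StateStore interface and its methods are hypothetical stand-ins for the actual storage layer.

```ts
// Hedged sketch of the coordination model, not the real implementation.
interface SyncRulesState {
  snapshot_done: boolean; // false while any table snapshot is still pending
  keepalive_op: bigint | null; // highest op persisted but not yet committed
}

interface StateStore {
  readState(): Promise<SyncRulesState>;
  updateState(update: Partial<SyncRulesState>): Promise<void>;
  createCheckpoint(lsn: string, upToOp: bigint): Promise<void>;
}

class BucketStorageBatchSketch {
  constructor(private db: StateStore) {}

  // Must now be called explicitly once every queued table snapshot has
  // finished; new slots no longer get this automatically.
  async markAllSnapshotDone(): Promise<void> {
    await this.db.updateState({ snapshot_done: true });
  }

  // keepalive(lsn) and commit(lsn) are now equivalent.
  async keepalive(lsn: string): Promise<void> {
    await this.commit(lsn);
  }

  async commit(lsn: string): Promise<void> {
    const state = await this.db.readState();
    if (state.snapshot_done && state.keepalive_op != null) {
      // All snapshots done: publish a consistent checkpoint at this LSN,
      // covering every op persisted so far.
      await this.db.createCheckpoint(lsn, state.keepalive_op);
      await this.db.updateState({ keepalive_op: null });
    }
    // Otherwise ops keep accumulating under keepalive_op until
    // markAllSnapshotDone() unblocks the next commit.
  }
}
```

Keeping this state in the database rather than in process memory is what makes it safe for snapshotting processes and the streaming process to interleave.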

Changes to replication

Starting with Postgres, we now start streaming changes immediately when replication starts, even if a snapshot is still required. To avoid consistency issues (see the sketch after this list), we:

  1. Ignore rows picked up by streaming replication for tables whose snapshot is still in progress - the snapshot itself will capture their latest state.
  2. Implement soft-deletes, so that deletes in the replication stream take priority over rows in the table snapshot [PENDING].
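A rough sketch of these two rules, assuming hypothetical helpers (tableSnapshotComplete, recordSoftDelete, apply) rather than the actual storage API; the soft-delete branch corresponds to the [PENDING] item:

```ts
type Change =
  | { kind: 'insert' | 'update'; table: string; row: Row }
  | { kind: 'delete'; table: string; rowId: string };

interface Row {
  id: string;
  [column: string]: unknown;
}

interface SnapshotAwareStorage {
  tableSnapshotComplete(table: string): Promise<boolean>;
  apply(change: Change): Promise<void>;
  recordSoftDelete(table: string, rowId: string): Promise<void>;
}

async function applyStreamedChange(change: Change, storage: SnapshotAwareStorage) {
  if (await storage.tableSnapshotComplete(change.table)) {
    // Normal path: snapshot finished, apply streamed changes directly.
    await storage.apply(change);
    return;
  }
  if (change.kind === 'delete') {
    // Soft-delete: remember the delete so that a snapshot row arriving
    // later for the same id is discarded instead of resurrecting it.
    await storage.recordSoftDelete(change.table, change.rowId);
  }
  // Inserts/updates for tables still being snapshotted are ignored;
  // the snapshot will pick up the current row state anyway.
}
```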

This also splits the snapshot implementation out from the streaming replication implementation. The snapshotter keeps a queue of tables to snapshot; currently it only snapshots one table at a time, but that can change in the future, as in the sketch below.
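For illustration, a minimal sketch of such a table queue, with snapshotTable() as a hypothetical stand-in for the actual per-table snapshot logic; making the drain loop process several tables at once is where concurrency could be added later:

```ts
class Snapshotter {
  private queue: string[] = [];
  private running = false;

  enqueue(table: string): void {
    this.queue.push(table);
    void this.drain();
  }

  private async drain(): Promise<void> {
    if (this.running) return; // a single drain loop at a time
    this.running = true;
    try {
      let table: string | undefined;
      while ((table = this.queue.shift()) !== undefined) {
        await snapshotTable(table); // hypothetical: copies the table contents
      }
    } finally {
      this.running = false;
    }
  }
}

declare function snapshotTable(table: string): Promise<void>;
```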

Tasks

  • Apply the same changes to Postgres bucket storage.
  • Re-implement the createEmptyCheckpoints logic.
  • Get all current tests passing.
  • Implement soft deletes, and add tests for that edge case.
  • Add low-level tests for concurrent storage usage.
  • Proper "job management" for the various replication tasks.
  • Docs.
  • MongoDB and MySQL implementations (future PR).

@changeset-bot commented Nov 20, 2025

⚠️ No Changeset found

Latest commit: ac790a7

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
